Scalable Collection of Large MPI Traces on Red Storm

Author

  • Rolf Riesen
Abstract

Gathering large MPI traces and statistics is important for performance analysis and troubleshooting of applications. Traces with detailed information about every message an application has sent are crucial for characterizing its message-passing behavior. On massively parallel systems like Red Storm, the amount of data collected impacts the performance and behavior of the application, which makes collecting it infeasible. We present a new tool that enables the scalable collection of large amounts of data on Red Storm class systems.


Similar resources

ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale

Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code/system complexity and their long execution times. An alternative to running actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless scalable trace collection, we con...


Implementation of Open MPI on Red Storm

The Open MPI project provides a high quality MPI implementation available on a wide variety of platforms. This technical report describes the porting of Open MPI to the Cray Red Storm platform (as well as the Cray XT3 platform). While alpha quality, the port already provides acceptable performance. Remaining porting work and future enhancements are also discussed.


A Comparison of Three MPI Implementations for Red Storm

Cray Red Storm is a new distributed memory massively parallel computing platform designed to scale to tens of thousands of nodes. Red Storm has a custom network designed around the Cray SeaStar network interface and router. In this paper, we present an evaluation of three different MPI implementations for Red Storm: the vendor-supported MPICH2 implementation, and two other implementations based...


ScalaTrace: Scalable compression and replay of communication traces for high-performance computing

We contribute an approach that provides orders of magnitude smaller, if not near-constant size, communication traces regardless of the number of nodes while preserving structural information. We introduce intra- and inter-node compression techniques of MPI events that are capable of extracting an application’s communication structure. We further present a replay mechanism for the traces generated...


Application Performance on the Tri-Lab Linux Capacity Cluster - TLCC

In a recent acquisition by DOE/NNSA, several large capacity computing clusters called TLCC have been installed at the DOE labs SNL, LANL, and LLNL. The TLCC architecture, with ccNUMA, multi-socket, multi-core nodes and an InfiniBand interconnect, is representative of the trend in HPC architectures. This chapter examines application performance on TLCC, contrasting it with Red Storm/Cray XT4. TLCC and ...



Publication date: 2007